Potential and limitations of cross-domain sentiment classification

Author: Cieliebak Mark
Deriu Jan Milan
von Grünigen Dirk
Weilenmann Martin
Publication venue: Association for Computational Linguistics
Publication date: 01/01/2017

In this paper we investigate the cross-domain performance of sentiment analysis systems. For this purpose we train a convolutional neural network (CNN) on data from different domains and evaluate its performance on other domains. Furthermore, we evaluate the usefulness of combining a large number of smaller annotated corpora into one large corpus. Our results show that more sophisticated approaches are required to train a system that works equally well on various domains.

Crossref

ZHAW digitalcollection

Twist Bytes : German dialect identification with data mining optimization

Author: Benites de Azevedo e Souza Fernando
Cieliebak Mark
Deriu Jan Milan
Grubenmann Ralf
von Däniken Pius
von Grünigen Dirk
Publication venue: VarDial
Publication date: 01/01/2018

We describe our approaches to the German Dialect Identification (GDI) task at the VarDial Evaluation Campaign 2018. The goal was to identify which of four dialects spoken in the German-speaking part of Switzerland a sentence belonged to. We adopted two different meta-classifier approaches and used some data mining insights to improve the preprocessing and the meta-classifier parameters. In particular, we focused on using different feature extraction methods and on how to combine them, since they influenced the system's performance very differently. Our system achieved second place out of 8 teams, with a macro-averaged F1 of 64.6%. We also participated in the surprise dialect task with a multi-label approach.

ZHAW digitalcollection

A methodology for creating question answering corpora using inverse data annotation

Author: Agirre Eneko
Cieliebak Mark
Deriu Jan Milan
Kaiser Nicolas
Mlynchyk Katsiaryna
Rodrigo Alvaro
Schläpfer Philippe
Stockinger Kurt
von Grünigen Dirk
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2020

In this paper, we introduce a novel methodology to efficiently construct a corpus for question answering over structured data. For this, we introduce an intermediate representation that is based on the logical query plan in a database, called Operation Trees (OT). This representation allows us to invert the annotation process without losing flexibility in the types of queries that we generate. Furthermore, it allows for fine-grained alignment of the tokens to the operations. Thus, we randomly generate OTs from a context-free grammar, and annotators just have to write the appropriate question and assign the tokens. We compare our corpus OTTA (Operation Trees and Token Assignment), a large semantic parsing corpus for evaluating natural language interfaces to databases, to Spider and LC-QuaD 2.0 and show that our methodology more than triples the annotation speed while maintaining the complexity of the queries. Finally, we train a state-of-the-art semantic parsing model on our data and show that our dataset is challenging and that the token alignment can be leveraged to significantly increase the performance.

arXiv.org e-Print Archive

Crossref

ZHAW digitalcollection

spMMMP at GermEval 2018 Shared Task: Classification of Offensive Content in Tweets using Convolutional Neural Networks and Gated Recurrent Units

Author: Benites Fernando
Cieliebak Mark
Grubenmann Ralf
von Däniken Pius
von Grünigen Dirk
Publication venue: oeaw
Publication date: 02/10/2018

In this paper, we propose two different systems for classifying offensive language in micro-blog messages from Twitter ("tweet"). The first system uses an ensemble of convolutional neural networks (CNN), whose outputs are then fed to a meta-classifier for the final prediction. The second system uses a combination of a CNN and a gated recurrent unit (GRU) together with a transfer-learning approach based on pretraining with a large, automatically translated dataset.

Elektronisches Publikationsportal der Österreichischen Akademie der Wissenschaften

spMMMP at GermEval 2018 shared task : classification of offensive content in tweets using convolutional neural networks and gated recurrent units

Author: Benites de Azevedo e Souza Fernando
Cieliebak Mark
Grubenmann Ralf
von Däniken Pius
von Grünigen Dirk
Publication venue: ÖAW Austrian Academy of Sciences
Publication date: 01/01/2018

In this paper, we propose two different systems for classifying offensive language in micro-blog messages from Twitter ("tweet"). The first system uses an ensemble of convolutional neural networks (CNN), whose outputs are then fed to a meta-classifier for the final prediction. The second system uses a combination of a CNN and a gated recurrent unit (GRU) together with a transfer-learning approach based on pretraining with a large, automatically translated dataset.

Elektronisches Publikationsportal der Österreichischen Akademie der Wissenschaften

ZHAW digitalcollection

Best practices in e-assessments with a special focus on cheating prevention

Author: Benites de Azevedo e Souza Fernando
Cieliebak Mark
Magid Amani
Pradarelli Beatrice
von Grünigen Dirk
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2018

In this digital age of computers, the Internet, social media, and the Internet of Things, e-assessments have become an accepted method to determine whether students have learned the material presented in a course. With the acceptance of this electronic means of assessing students, many questions arise about the method. What should the format of an e-assessment be? How much time should be allowed? What kinds of questions should be asked (multiple choice, short answer, etc.)? These are only a few of the many questions. In addition, educators have always had to contend with the possibility that some students might cheat on an examination. It is widely known that students are oftentimes more technologically savvy than their professors. So how does one prevent students from cheating on an e-assessment? Understandably, given the amount of information available on e-assessments and the variety of formats to choose from, choosing to administer e-assessments over paper-based assessments can lead to confusion on the part of the professor. This paper presents helpful guidance for lecturers who want to introduce e-assessments in their classes, and it provides recommendations about the technical infrastructure needed to prevent students from cheating. It is based on a literature review, on an international survey that gathers insights and experiences from lecturers who are using e-assessments in their classes, and on a technological evaluation of e-assessment infrastructure.

Crossref

ZHAW digitalcollection

Four different ways to build a chatbot about movies

Author: Benites de Azevedo e Souza Fernando
Cieliebak Mark
Deriu Jan Milan
Eich Walter
Graf Hans Daniel
Koc Yusuf
Neuhaus Stephan
Neureiter Nico
Panighetti Sandro
Stockinger Kurt
Togni Matteo
von Däniken Pius
von Grünigen Dirk
Weilenmann Martin
Xhoxhaj Erland
Zürrer Daniel
Publication venue
Publication date: 01/01/2017

ZHAW digitalcollection